[python-package] remove unnecessary files to reduce sdist size (fixes #6560) #6565
This PR fixes #6560.
Right now, the packages at https://pypi.org/project/xgboost/#files include some files like documentation and tests that I think could be safely removed. This PR proposes making the rules in MANIFEST.in more specific, to reduce the size of distributions of the Python package.
how I calculated these sizes
You can check the list of files to be included by running the following:
cd python-package
python setup.py sdist
cat xgboost.egg-info/SOURCES.txt
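SOURCES.txt lists the included paths but not how much space each one takes. As a rough complement, here is a minimal sketch that totals the uncompressed bytes per directory inside a built sdist. The helper name `sdist_size_by_prefix` and its `depth` parameter are made up for illustration and are not part of this PR or of xgboost; it only assumes the sdist is a gzip-compressed tarball.

```python
import tarfile
from collections import Counter


def sdist_size_by_prefix(sdist_path, depth=2):
    """Sum uncompressed file sizes per leading path prefix in a .tar.gz sdist.

    Hypothetical helper: `depth` is how many leading path components
    to group by (2 groups by "<name>-<version>/<top-level entry>").
    """
    sizes = Counter()
    with tarfile.open(sdist_path, mode="r:gz") as tar:
        for member in tar.getmembers():
            # Skip directories, links, and other non-regular entries.
            if not member.isfile():
                continue
            prefix = "/".join(member.name.split("/")[:depth])
            sizes[prefix] += member.size
    return sizes
```

Calling `sdist_size_by_prefix(path).most_common(10)` on the tarball that `python setup.py sdist` writes under `dist/` should show which top-level entries contribute most to the uncompressed size. Note that these are uncompressed sizes, so they will not match the download size shown on PyPI.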
included files as of this file
How this improves xgboost
This is valuable for storage-sensitive environments, like AWS Lambda. See my comment at pandas-dev/pandas#30741 (comment) for more explanation of that.
Reducing the package size can also help people who have slow download speeds, which I think is even more important now than it was in the past because the new pip resolver downloads source distributions (possibly many versions for one package) while resolving dependencies (pypa/pip#9187).

Thanks for your time and consideration!